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Abstract — We describe a new algorithm, termed subspace evo- 
lution and transfer (SET), for solving low-rank matrix completion 
problems. The algorithm takes as its input a subset of entries of 
a low-rank matrix, and outputs one low-rank matrix consistent 
with the given observations. The completion task is accomplished 
by searching for a column space on the Grassmann manifold 
that matches the incomplete observations. The SET algorithm 
consists of two parts - subspace evolution and subspace transfer. 
In the evolution part, we use a gradient descent method on 
the Grassmann manifold to refine our estimate of the column 
space. Since the gradient descent algorithm is not guaranteed 
to converge, due to the existence of barriers along the search 
path, we design a new mechanism for detecting barriers and 
transferring the estimated column space across the barriers. 
This mechanism constitutes the core of the transfer step of 
the algorithm. The SET algorithm exhibits excellent empirical 
performance for both high and low sampling rate regimes. 

Index Terms — Grassmann manifold, linear subspace, matrix 
completion, non-convex optimization. 

I. Introduction 

Suppose that we observe a subset of entries of a matrix. The 
matrix completion problem asks when and how the matrix can 
be recovered based on the observed entries. In general, this re- 
construction task is ill-posed and computationally intractable. 
However, if the data matrix is known to have low-rank, exact 
recovery can be accomplished in an efficient manner with 
high probability, provided that sufficiently many entries are 
revealed. Low-rank matrix completion problems have received 
considerable interests due to their wide applications, ranging 
from collaborative filtering (the NETFLIX challenge) to sensor 
network tomography. For an overview of these applications, 
the reader is referred to |[T1. 

An efficient way to solve the completion problem is via 
convex relaxation. Instead of looking at rank-restricted ma- 
trices, one can search for a matrix with minimum nuclear 
norm, subject to data consistency constraints. Although in 
general nuclear norm minimization is not equivalent to rank 
minimization, the former approach recovers the same solution 
as the latter if the data matrix satisfies certain incoherence 
conditions [2]. More importantly, nuclear norm minimization 
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can be accomplished in polynomial time by using semi- 
definite programming, singular value thresholding (SVT) |3l, 
or methods adapted from robust principal component analysis 

Several low-complexity alternatives to nuclear norm min- 
imization have been proposed so far Realizing the intimate 
relationship between compressive sensing and low-rank matrix 
completion, a few approaches for low-rank completion can 
be viewed as generalization of those for compressive sensing 
reconstruction. In particular, the ADMiRA algorithm ||5l is a 
counterpart of the subspace pursuit (SP) |6| and CoSaMP Q 
algorithms, while the singular value projection (SVP) method 
|8| extends the iterative hard thresholding (IHT) (9) approach. 
There are other approaches that rely more on the specific 
structures of the low-rank matrices. The power factorization 
algorithm described in |10| takes an alternating optimization 
approach. In the OptSpace algorithm described in |11J, a 
simultaneous optimization on both column and row spaces is 
employed. 

We address a more general class of problems in low- 
rank matrix completion - consistent completion. Consistent 
completion extends the previous completion framework in 
that it does not require the existence of a unique solution 
to the problem. This extension seems questionable at first 
glance - in highly undersampled observation regimes, there 
may exist many low-rank matrices that match the observations 
- which makes the final result have less practical value. 
Nevertheless, the consistent completion paradigm allows for 
identifying convergence problems with standard completion 
techniques, and it does not require any additional structure 
on the matrix, such as incoherence. Furthermore, as will be 
shown in the subsequent exposition, when confronted with 
very sparsely sampled matrices all methods known so far 
fail to produce any solution to the problem, despite the fact 
that many exist. Finally, even in the sampling regime for 
which SVT, OptSpace and other techniques have provable, 
unique reconstruction performance guarantees, the consistent 
completion technique described in this contribution exhibits 
significantly better results. 

To solve the consistent matrix completion problem, we 
propose a novel subspace evolution and transfer (SET) method. 
We show that the matrix completion problem can be solved by 
searching for a column space (or, alternatively, for a row space) 
that matches the observations. As a result, optimization on the 
Grassmann manifold, i.e., subspace evolution, plays a central 
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role in the algorithm. However, there may exist "barriers" 
along the search path that prevent subspace evolution from 
converging to a global optimum. To address this problem, in 
the subspace transfer part, we design mechanisms to detect 
and cross barriers. The SET algorithm improves the recovery 
performance not only in high sampling rate regime but also 
in low sampling rate regime where there may exist many 
low-rank solutions. Empirical simulations demonstrate the 
excellent performance of the proposed algorithm. 

The SET algorithm employs a similar approach as that of the 
OptSpace algorithm (TT] in terms of using optimization over 
Grassmann manifolds. Still, the SET approach substantially 
differs from the method supporting OptSpace ifTTl . Searching 
over only one space (column or row space) represents one 
of the most significant differences: in OptSpace, one searches 
both column and row spaces simultaneously, which introduces 
numerical and analytical difficulties. Moreover, when optimiz- 
ing over the column space, one has to take care of "barriers" 
that prevent the search procedure from converging to a global 
optimum, an issue that was not addressed before since it was 
obscured by simultaneous column and row space searches. 

The paper is organized as follows. In Section|II]we introduce 
the consistent low-rank completion problem, and describe the 
terminology used throughout the paper In Section |III] we 
outline the steps of the SET algorithm. Simulation results are 
presented in Section |IV] All proofs are listed in the Appendix 
sections. 

II. Consistent Matrix Completion 

Let X G M™^" be an unknown matrix with rank r ^ 
min (m, n), and let C [m] x [n] be the set of indices of 
the observed entries, where [K] — {1, 2, • • • , K}. Define the 
projection operator Vn by 

P„(X)^X„, Where (X„),, = {f- ^ ^-^f^ 

The consistent matrix completion problem is to find one rank-r 
matrix X' that is consistent with the observations Xq,, i.e., 

(PO) : find a X' such that 

rank {X') < r and Vn (X') = Vn {X) = Xn- (1) 

This problem is well defined as all our instances of Xq^ 
are generated from matrices X with rank r and therefore 
there must exist at least one solution. Here, like in other 
approaches Q, IfTOl , IfTTl . we assume that the rank r is given. 
In practice, one may try to sequentially guess a rank bound 
until a satisfactory solution has been found. 

We also introduce the (standard) projection operator V, 

V{x.,U)^V = UU^x, 

where 1 < k < m, and where the superscript f denotes 
the pseudoinverse of a matrix. That is, V {x,U) gives the 
projection of the vector x on the hyperplane spanned by the 
matrix U, i.e., span(C/). It should be observed that U^x is 
the global minimizer of the quadratic optimization problem 
mini„gR/= \\x - Uw\\p . 



A. Why optimizing over column spaces only? 

In this section, we show that the problem (PO) is equivalent 
to finding a column space consistent with the observations. 

Let Um.r be the set of m x r matrices with r orthonormal 
columns, i.e., U^^r {C/ e R™^'' : U^U = Ir) ■ Define a 
function 

f{U)^ ^mm^JXn-rn{UW^)\\l, (2) 

where denotes the Frobenius norm. The function / 

captures the consistency between the matrix U and the obser- 
vations Xu '. if / (U) = 0, then there exists a matrix W such 
that the rank-r matrix UW'^ satisfies Vn (UW'^) = Xq. 
Hence, the consistent matrix completion problem is equivalent 
to 

(PI) : find (7 e U,n^r such that / (U) = 0. (3) 

An important property of the objective function / is that / 
is invariant under rotations. More precisely, / (U) = / {UV) 
for any r-by-?- orthogonal matrix V G Ur.r- This can be easily 
verified, as UW^ = (UV) {WVf. Hence, the function 
/ depends only on the subspace spanned by the columns of 
U, i.e., the span(C/). Note that all columns of the matrix 
of the form UW'^ lie in the linear subspace span(C/). The 
consistent matrix completion problem essentially reduces to 
finding a column space consistent with the observed entries. 
Note that instead of identifying the column space in which the 
observations lie, one can also use the row space instead. All 
results and the problem formulation remain valid in this case as 
well. Which space to search over will depend on the dimension 
of the matrix, and the particular sampling pattern (which 
determines the density of rows and columns of the matrix). 
In addition, one can run in parallel two search procedures - 
one on the column space, the other on the row space. Here, we 
only focus on the simplest scenario, and restrict our attention 
to column spaces. 

B. Grassmann manifolds and geodesies 

We find the following definitions useful for the exposition 
to follow. The Grassmann manifold Qm^r is the set of all r- 
dimensional linear subspaces (hyperplanes through the origin) 
in R", i.e., Gm,r = {span (U) : U G Um,r}- Given a subspace 
^ G Gm,r, one can always find a matrix U G Um.r, such that 
= span(C/). The matrix U is referred to as a generator 
matrix of and the columns of U are often referred to as 
an orthonormal basis of Since span ([/) — span(i7V) 
for all V E Ur,r, it is clear that the generator matrix for a 
given subspace is not unique. Nevertheless, a given matrix 
U S Um,r uniquely defines a subspace. For this reason, we 
henceforth use U to represent its induced subspace. 

To search for a consistent column space, we use a gradient 
descent method on the Grassmann manifold. For this purpose, 
we introduce the notion of a geodesic curve in the Grassmann 
manifold. Roughly speaking, a geodesic curve is an analogue 
of a straight line in an Euclidean space: given two points on 
the manifold, the geodesic curve connecting them is the path 
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of the shortest length in the manifold. Let U {t) be a geodesic 
curve (parametrized by i e M) in the Grassmann manifold. 
Denote the starting point of this geodesic curve by U (0) 
U e Urn.r, and the direction by f/ (0) = if G R'"-''. Let 
H — UhShVh be the compact singular value decomposition 
of H, and let si, • • • ,Sr denote the singular values of H in 
descending order. Then the corresponding geodesic curve is 
given by ||T21 



U{t) = [UVhMh] 



cos St 
sin St 



V, 



H 1 



(4) 



where cos St e R''^'' and sin St G R''^'' are r x r diagonal 
matrices with diagonal entries cos(sit),--- , cos (srt) and 
sin (sit) , sin (s^t), respectively. 

When H has rank one, i.e., S2 = S3 = •••=: s, ^ 0, 
the equation for the geodesic curve has a particularly simple 
form. In this case, let Ui, -- ,Ur be the columns of the 
matrix [/V^Q Let h G Vlm,i be the left singular vector of 
H corresponding to the largest singular value. After a change 
of variables, the geodesic curve can be written a^ 

U {t) — [uicost + hsint,U2,- ■ ■ ^Ur] , te[0,7r). (5) 

Here, the range of values for the parameter t is restricted to 
[0,7r), since 

span {U [t + tt)) = span ([— Mi cost — /isint, M2, • • • , Mr]) 
= span(J7(t)), 

and therefore span {U (<)) is a periodic function with period 



in. The set Algorithm - A Two Step Procedure 

A. The SET algorithm: a high level description 

Our algorithm aims to minimize the objective function 
f {U). The basic component is a gradient search approach: 
for a given estimate U, we search in the gradient descent 
direction for a minimizer. This part of the algorithm is referred 
to as "subspace evolution". The details are presented in Section 

imis 

The main difficulty that arises during the gradient descent 
search, and makes the SET algorithm highly non-trivial, is 
when one encounters "barriers". Careful inspection reveals that 
the objective function / can be decomposed into a sum of 
atomic functions, each of which involves only one column of 
Xq (see Section ITlI-CI for details). Along the gradient descent 
path, the individual atomic functions may imply different 
search directions: some of the functions may decrease and 
some others may increase in the same dkection. The increases 
of some atomic functions may result in "bumps" in the / 
curve, which block the search procedure from reaching a 
global optima and are therefore referred to as barriers. The 
main component of the "transfer" part of the SET algorithm 

'Note that span {U) = span (C/V/j)- The starting point (in the Grassmann 
manifold) does not change. 

-Again, although the matrix U (t) in (3) and the matrix U (t) in @ may 
be different, both matrices generate the same hypeiplane in the Grassmann 
manifold Qm,r- Therefore, Equations ^4) and ^5) describe the same geodesic 



is to identify whether there exist barriers along the gradient 
descent path. Detecting barriers is in general a very difficult 
task, since one obviously does not know the locations of global 
minima. Nevertheless, we observe that barriers can be detected 
by the existence of atomic functions with inconsistent descent 
directions. Such an inconsistence can be seen as an indicator 
for the existence of a barrier When a barrier is expected, the 
algorithm "transfers" the current point of the line search - i.e., 
its corresponding space - to the other side of the barrier, and 
proceeds with the search from that point. Such a transfer does 
not overshoot global minima as we enforce consistency of the 
steepest descent directions at the points before and after the 
transfer. The details of barrier detection and subspace transfer 
ai-e presented in Sections UlLCl UlLDl UlLEl and HlLFl 

The major steps of the SET algorithm are given in Algo- 
rithm [U Here, we introduce an error tolerance parameter eg > 



0. The stopping criterion is given by \\Xf, 



< 

£e where X' denotes the estimated low-rank matrix. 

In our simulations, we set — 10~^. The SET algorithm 
described below only searches for an optimal column space, 
represented by U. Other modifications are possible, as already 
pointed out. For example, to speed up the process, one 
may alternatively optimize over U and V (representing the 
column and row spaces, respectively). These extensions are 
not described in the manuscript. 

Algorithm 1 The SET algorithm 
Input: Xfj, fi, r and 6^. 
Output: X'. 

Initialization: Randomly generate a U G Um,r- 
Steps: Execute the following steps iteratively: 

1) Perform subspace transfer algorithm described in Algo- 
rithm [3] 

2) Perform subspace evolution algorithm described in Al- 
gorithm |2] 

3) According to (|2]) find the optimal Wu and set X' — 
UWu. If \\Xn - Vn {X')\\l < ee \\Xn\\l, output X' 
and quit. Otherwise, go to Step 1). 



B. Subspace evolution 

For the optimization problem at hand, we refine the current 
column space estimate U using a gradient descent method. 
For a given U G Um^r, it is straightforward to solve the least 
square problem 



min \\Xn-Vn{UW)fp 



(6) 



Denote the optimal solution by Wjj- Let Xr ~ X^ — 
Vn {UWu) be the residual matrix. Then the gradienj^ of / 
at U is given by 



Vt// - -2XrW^. 



(7) 



The proof of this claim is given in Appendix [A] The gradient 
Vt// gives the direction along which the objective function / 

'The gradient is well defined almost everywhere in Um.,r- 
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increases the fastest. In classical gradient descent methods, 
the search path direction is opposite to the gradient, i.e., 
— Vu/. In order to make the search step more suitable for 
the transfer step, we choose the search direction as follows. 
Consider the singular value decomposition of the matrix Vj//. 
Let h e Um,i and v £ Ur^i be the left and right singular 
vectors corresponding to the largest singular value of Vu/Q 
Then the search direction is defined as 



Algorithm 2 Subspace evolution. 



H 



-hv' 



(8) 



It can be easily verified that if Vt// 7^ then {H, Vuf) — 
trace (^H^\/uf) < 0, and therefore the objective function 
decreases along the direction of H. The geodesic curve 
starting from U and pointing along H can be computed via 

The subspace evolution part is designed to search for a 
"neighboring minimizer" of the function / along the geodesic 
curve. It is an analogue of the line search procedure in 
Euclidean space. Its continuous counterpart consists of moving 
the estimate U continuously along the direction H until the 
objective function stops decreasing. For computer simulations, 
one has to discretize the continuous counterpart. Our imple- 
mentation includes two steps. Let t* denote the neighboring 
minimizer along the geodesic curve. The goal of the first 
step is to identify an upper bound on t* , denoted by imax- 
Since / {t) is periodic with period tt, tmax is upper bounded 
by TT. The second step is devoted to locating the minimizer 
t* e [0,tniax] accurately by iteratively applying the golden 
section rule |13|. These two steps are described in Algorithm 
|2l The constants are set to e = 10~^, ci — (a/5 — l) /2, 
C2 = Cl / (1 — Cl) and itN = 10. Note that our discretized 
implementation is not optimized with respect to its continuous 
counterpart, but is sufficiently accurate in practice. 



C. Subspace transfer 

Unfortunately, the objective function / ([/) is typically 
not a convex function of U. The described linear search 
procedure may not converge to a global minimum because 
the search path may be blocked by what we call "barriers". 
In subsequent subsections, we show how "barriers" arise in 
matrix completion problems and how to overcome the problem 
introduced by barriers. 

At this point, we formally introduce the decoupling princi- 
ple. This principle is essential in understanding the behavior 
of the objective function. It implies that the objective function 
f [U [t)) can be decoupled into a sum of atomic functions, 
each of which is relatively simple to analyze. Specifically, the 
objective function f {U (t)) is the squared Frobenius norm 
of the residue matrix; it can be decomposed into a sum of 
the squared Frobenius norms of the residue columns. Let 
e M™^^ be the j*'' column of the matrix Xq. Let Vcij 
be the projection operator corresponding to the j*'' column, 

"^With probability one, the largest singular value is strictly positive and 
distinct from other singular values. 



Input: Xq, n, U, and UN. 
Output: t* and U (t*). 

Initialization: Compute the gradient and the search direction 
according to (|7| and ([8]) respectively. The geodesic curve U {t) 
along the search direction can be computed via (|5]l. 
Step A: find t^ax < tt such that t* e [0,iinax] 
Let t' = en. 

1) Let t" = C2 • t'. If t" > TT, then i^ax = tt. Quit Step A. 

2) If / [U it")) > f{U (t)), then W = t". Quit Step A. 

3) Otherwise, t' = t". Go back to step 1). 
Step B: numerically search for t* in [0,tmax]- 

Let ti = tnmx/cl, t2 = imax/c2, ti = tniax, and <3 = tl + 

Cl (^4 — ti). Let itn ~ 1. Perform the following iterations. 

1) If / {U (ii)) > f{U ih)) > fiU ih)), then ii = t2, 
t2 = t3, and ts ^ti + Cl (^4 - ti). 

2) Else, — ^3, ^3 — t2 and ^2 = + (1 ^ ci) (^4 — ti). 

3) itn — itn + 1. If itn > itN, quit the iterations. 
Otherwise, go back to step 1). 

Let t* = argmin f (U (t)) and compute U (t*). 



defined by 



m . mm 



'Pn.iv) ^-> ■uo,, where (vn^). = 



v.i if e 
ifii,j)(^n 



(9) 



Then the objective function / {u (t)) can be written as a sum 
of n atomic functions: 

f{U{t))= min \\Xn-rn{U{t)W)\\l 

n 



fAU{t)) 

where W-j is the j*'' column of the matrix W. This de- 
coupling principle can be easily verified by the additivity of 
the squared Frobenius norm. A formal proof is presented in 
Appendix |B] 

We study atomic functions along the geodesic curve in a 
rank-one direction (|5]l and summarize their typical behavior 
in the following proposition. 

Proposition 1: Let U (t) be of the form in (|5]). Given a 
vector X E M™ and an index set il C [m], consider the 
function 



U.n{U{t))= min \\xn - Vn [U [t) w)\\ 



(11) 



Then either one of the following two claims holds. 

1) The function f^.n (U [t)) is a constant function. 

2) The function f^ Q [U (t)) is periodic, with period tt. It 
has a unique minimizer, <,nin G [0, tt), and a unique 
maximizer, tj^nx G [0,7r). 

The proof is given in Appendix |D] and the computations of 
tynin and tmax are detailed in Section UlI-FI 
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D. Barrier - an illustration 

We use the following example to illustrate the concept of 
a barrier. Consider an incomplete observation of a rank-one 
matrix 

? 2 1 
Xn= 3 ? 1 
3 2 ? 

where question marks denote that the corresponding entries 
are unknown. It is clear that the objective function / {U (t)) 
is minimized by Ux ~ -i= [1, 17 1]"'", i-S-, / (Ux) — and the 
recovered matrix equals X — [1, 1, 1] • [3, 2, 1]. Let us study 
one of the atomic functions, say /i {U). For any U E U^^i 

of the form [Vl - 2e2, e, e] with e G [-1/V2, 1/V2] \ {0}, 
one has 



fi(U) 







" " 




" " 




min 




3 




e 


w 






3 




e 





Similarly, For any U of the form — 2e^, e, — e] with 
e e [-l/\/2, l/\/2|, one has 







" " 









min 




3 




e 


w 


















3 




— e 





18. 



As a result, 

/i (t/) - 0, if C/2 - C/3 ^ 0; 
/i ([/) = 18, if (72 - -U3. 

This gives us the two contours depicted in Fig. [Ta] (projected 
on the plane spanned by U2 and U3, the second and the 
third entries of the vector U respectively). Suppose that one 
starts with the initial guess U (0) = [—10, 1, 1]"^. Then 

/ (U (0)) = Eti f^ (U (0)) < + 8 + 2 = 10. On the 



other hand, for any U in the preimage of /i (U) — 18, 
one has / (C/) > 18 > 10 > / [U (0)). As a result, any 
gradient descent method (continuous version) can not lead the 
estimate U {t) to cross the contour {U : fi (U) — 18}. That 
is, the contour fi — 18 forms a "barrier" for the line search 
procedure. A more careful analysis reveals that the objective 
function / is not continuous at the point U — [1,0,0] . Our 
extensive simulations suggest that a gradient descent procedure 
is typically trapped towards these singular points. See Fig. [lb] 
for an illustration of this phenomenon. 

E. Barrier Detection and Subspace Transfer 

We describe a heuristic procedure for detecting barriers and 
transferring the current estimate U from one side of a barrier 
to the other side. 

The intuition behind barrier detection is as follows. Recall 
that every atomic function is periodic and has a unique min- 
imizer and maximizer in one period. In the gradient descent 
direction, some atomic function increase while some others 
decrease. On the other hand, in the matrix completion problem, 
the objective function reaches zero at a global minimizer This 
implies that each atomic function reaches its minimum at a 




(a) Contours of f\. 



/i = 18 




Line search paths 



'U 



(b) Search paths with zooming in. 

Figure 1 : An illustrative example for barriers. 



global minimizer. That is, in a small neighborhood of a global 
minimizer, the atomic functions should be "consistent": there 
should exist a small e > such that when current estimate U 
is e-close to the global minimizer Ux, there is no atomic 
function reaching its maximum value along the path from 
current estimate U to the global minimizer Ux- Following 
this intuition, we have the following definition of barriers. 
Consider the geodesic path in Q starting from U, pointing in 
the direction H. Denote the unique minimizer and maximizer 
of the fc*'' atomic function by t„iin.fe and imax,fc (for constant 
atomic functions, we set tniin.fc — 't-ma^,k = 0). Refer to 
the atomic functions that decrease in the direction of H as 
consistent atomic functions. We say that the maximizer of the 
k*^ atomic function forms a barrier if 

1) In the H direction, there exists a consistent atomic 
function, say the atomic function, such that the 
maximizer of the fc*'' atomic function appears before the 
minimizer of the j*'* atomic function. That is, there ex- 
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istS j e [n] such that < tmax,fe < tminj < U 

2) The gradients of / at J7 (0) and U (imax.fc) are consis- 
tent (form a sharp angle), i.e., -j-J [U (t)) |t=t„„^ < 0. 
In Appendix |C] we describe how to decide whether 

ifiu{t))U=t„_,<o. 

Moreover, we say that the j"* column of admits barriers 
if there exists a fc G [n] such that the maximizer of the fc*'* 
atomic function forms a barrier and tmax,fc < iminj < ^maxj - 
Once barriers are detected, we transfer U. To avoid over- 
shooting, the transfer destination should be "e-close" to the 
barrier As e — ?> 0, the transfer destination is on the barrier 
(U (tmax.fe) for some fc). In our implementation, we focus on 
the "closest" barriers to U. Define 

= {j ■ the j*'' column of admits barriers} , (12) 



argmin tmiaj, and 



(13) 



fc* — argmax {tmax,fc : the maximizer of the k*^ atomic 

k 



function forms a barrier and t„ 



•}■ 



(14) 



We transfer our current estimation U (0) to U (tmax.fc*)- 

The subspace transfer part is a combination of barrier de- 
tection and column space transfer. It is described in Algorithm 

El 

Algorithm 3 Subspace transfer 
Input: Xq, Q, and U. 
Output: ttran and U (ttran)- 
Steps: 

1) Compute imaxj and iminj for each column j. 

2) Check whether there exist barriers. 

a) Find j* and fc* according to ( fT3b and ( fT4] l. respec- 
tively. 

b) Let ttran = ^max.fc' and compute U (ttran) accord- 
ing to (|5]l. 

3) If no barrier is detected (the set J' in ( fT2l i is empty), 
then ttran and J7 (tti-an) = U. 



F. Computation of tmin t^nd trnax 

The subspace transfer part of the SET algorithm relies 
on the minimizers and maximizers of atomic functions. This 
subsection presents the details for computing these extremals. 

Let U (t) be of the form in (|5]l. Also, let C [m] be an 
index set. Define 

Uii (t) = [Vn {ui cost + h sin t) , T'o (M2) , • • • , («r)] 
= cost + hn sint, U2,n, ■ ■ ■ ,Ur,n] ■ 

For a given vector x E W", denote Vn {x) by a;^- Define 

xn,r {t) ^xn-V {xn, Un it)) . 

The above expression simply specifies the projection residue 
vector of xu, where the projection is performed on the 



hyperplane span {Un (t)). Note that Xit^r (t) is a function of 
1*2,0, • • • , Ur^n- 

We would like to understand how a;n,r (t) changes with 
t. Note that 1*2,0, • • • , Wr.si do not change with t. We shall 
find an expression of x^.r {t) that does not directly include 
1*2. o, ■ ■ ■ , Mr,o- For this purpose, let 

x'^=xn-V {xii, [u2,n, • • • , Mr,n]) , 

Ur = Mi,o - V [1*2,0, • • • , itr,o]) , and 

hr=hQ,-V {ha, [M2,0, • • • , iir,o]) ■ 

Let 

Ur (t) = Ur COS t + hr sin t. 

According to Proposition [3] in Appendix iDl we have 

Xn,r {t) =Xr-V {x'r, Ur (t)) . 

Note that Ur (t) has a simpler form compared to U (t), and 
is therefore easier to analyze. 

According to Proposition [T] the function f^ Q (t) = 
\\xn.r (i)ll'^ is either a constant function or a periodic function 
with a unique maximizer and minimizer in one period tt. 
We are interested in computing the unique maximizer and 
minimizer, denoted by tmax and tmin respectively, when the 
function is not constant. Apply Proposition |2] in Appendix iDl 
the following procedure generates the values of t^ax and tmi„. 

1) Check whether 

a) the vectors Ur and hr are linearly dependent, or 

b) the vector Xr is orthogonal to both Ur and hr- 

If either of the above two properties holds, then f^.n (t) 
is a constant function. Set tmin = imax = and quit the 
procedure. 

2) Let 



c = 



Cl 
C2 



where the superscript f , as before, denotes the pseudoin- 
verse. Define a mapping 



atan 



\/2 

if X2 = 0, 
tan^i {X1/X2) 

if X2 7^ and xi/x2 > 0, 
TT — tan^^ {~xi/x2) 

if a;2 7^ and xi/x2 < 0. 



(15) 



Then 



imax = atan(c2,ci) . 
3) The minimizer tmin is computed via 

^min ~ atan (^Xr Ur, — Xr hr^ 
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IV. Performance Evaluation 

We tested the SET algorithm by randomly generating low- 
rank matrices X and index sets fl. Specifically, we decom- 
posed the matrix X into X = UxSxVx^ where Ux G 
Urn,r, Vx G Un^r, and Sx G W^'' . We generated Ux and 
Vx from the isotropic distribution on the set Um,r and Un,r, 
respectively. The entries of the Sx matrix were independently 
drawn from the standard Gaussian distribution N {Q, 1). This 
step is important in order to guarantee randomness in the 
singular values of X. The index set VI is also randomly 
generated according to a uniform distribution over the set 
{51' C [m] X [a] : = k}, for some constant k. 

The performance of the SET algorithm is excellent, when 
compared to the performance of other low-rank completion 
methods. We tested different matrices with different ranks 
and different sampling rates, defined as \U\ / {rax n). Fig. |2] 
illustrates the performance improvement due to the subspace 
transfer step. Significant gain is observed by integrating the 
subspace evolution and subspace transfer steps. Fig. [5] shows 
the performance of the SET algorithm for several choices of 
matrix sizes and ranks. We also compare the SET algorithm 
to other matrix completion algorithm^ As shown in Figure 
m the SET algorithm outperforms all other tested completion 
approaches. One unique property of the SET algorithm is that 
it works well in both high sampling rate and low sampling rate 
regimes: in the high sampling rate regime, the SET algorithm 
finds the unique low-rank solution; in the low sampling rate 
regime, it finds one of the possibly multiple low-rank solutions. 
Also note that there exists a region of sampling rates for which 
the SET algorithm (actually all tested algorithms) exhibits 
poor performance: the width and critical density of this region 
depends on the matrix dimension and rank, and this regions 
moves to the right as the rank increases. 

Finally, we would like to comment on the complexity of 
the SET algorithm. The computational complexity is related 
to the number of iterations required for convergence. Since it 
incorporates a gradient descent part, the SET algorithm inherits 
the general disadvantages of a gradient descent approach: the 
algorithm may take a large number of iterations to converge; 
within each iteration, finding the optimal step size can be 
time consuming. Furthermore, extra computations are required 
for the subspace transfer step. At the current stage, we do 
not have an accurate analytical estimate of the computational 
complexity. 



Appendix 

A. Proof of the form of the gradient in ([7| 

Let Fu be the m x r matrix of partial derivatives, i.e., 
{Fu)i j ~ df/dUij. We first write the objective function via 



Though the SVT algorithm is not designed to solve the problem (PO), we 
include it for completeness. In the standard SVT algorithm, there is no explicit 
constraint on the rank of the reconstructed matrix. For fair comparison, we 
take the best rank-r approximation of the reconstructed matrix, and check 
whether it satisfies the performance criterion. 





0.9 - 




0.8 - 


o 


0.7 - 


(5 


DC 




c 




o 
B 


0.6 - 


ij 




uJ 
c 


0.5 - 


o 




o 

0) 
(L 


0.4 - 


t3 
re 




X 
LU 


0.3 - 




0.2 - 




0.1 - 




- 








9-by-9 matrices, # of realizations=200 




X'X : 

-H — SET: rank=t : 

M SET: rank=2 
' '+ ' ' SE(no transfer step): rank=1 
' 'X ' ' SE(no transfer step): rank=2 



0.2 



0.4 0.6 
Sample Rate 



Figure 2: Performance improvement due to the subspace 
transfer step. 
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Figure 3: Performance of the SET algorithm. 



the trace function: 



/ = {Vn (Xn 

(a) 



(fc) 



UWu),Vn{Xn-UWu)) 
\X - UWu,V:, {Vn {Xn - UWu))) 
X-UWu,Vn {Xn-UWu)), 



trace {X - UWuY Vn {Xn - UWu) 



where the symbol Vq in (a) denotes the adjoint operator of 
Vn- Equation (a) follows from the definition of the adjoint 
operator, and equation {b) holds because the operator Vn is 
self-adjoint and idempotent. Note that 



df 
dU. 



df 



dU 



Wu 



E 

k.i 



df 



d{Wu) 



u 



dU, 
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Figure 4: Performance comparison. 

Since Wjj is the solution of the least square problem in (|6]l, 
we have 



Of 



Therefore, 

K 

dU 



= 0, for all 1 < fc < r and 1 < £ < 



u 



dU 



= -2Vn {Xn - UWu) = -2X,W^. 

According to |12, pg. 20], the corresponding tangent vector 
Vuf (with respect to the Grassmann manifold) is given by 
Vuf = Fu — UU'^ Fu- Since Wjj minimizes the Frobenius 
norm, it is straightforward to verify that U is orthogonal to 
Xr, i.e., U^Xr = 0. Therefore, Vf// = Fu = -2XrW^ 
which proves (|7]i. 

B. Proof of the decoupling principle in ( 17 OH 



Arbitrarily pick a J7 e 



For the matrix Xn, the 



objective function \\Xit — Vn {UW)\\ p is convex in W. 
Let be a global minimizer for this function. For each 

II 1 1 2 

column of X^, say x^., the function \\xi-i. - Vn- {UW;,j)\\ 
is also convex. Let W^^'^ now be the global minimizer for 



this j atomic function. Concatenate W\ 



(1) 



(1) 



mto 



a matrix and denote the resulting matrix by TV^^^ By the 
additivity of the squared Frobenius norm, the right side of 
([Toll becomes \\Xn - Vn {UW^^'^)\\\. By the definition of 



On the other hand. 



< 



|2 

\\f' 
\Xn 



|2 

If' 



Xn - Vn ( C/VF(i) 



< 



"2 

E 
E 



\Xr 



This proves equation ( flOb . 

C. Determination of Consistency 

Let G = Vt//|f/(t_^^ ) be the gradient of / at {7 (tmax./c)- 
It can be computed via Q. Consider the geodesic curve in (|5]). 
Define 

Hit) = [-Misint + /icosi,0, • • • ,0] forte [0,7r). 

It can be shown that H (imax.fc) is the parallel transportation 
of H at tmax,fc (see [12. pg. 19] for more details). Based on the 
definition of the gradient, it can be shown that j-J (U (t)) < 
if and only if 



k)) — G^H (tmax,fe) < 0. 



D. Proof of Proposition |7] 

This subsection presents the proof of Proposition [T] and the 
mechanism in Section UlI-FI for computing ^niax and imin- We 
first study the case r = 1 and then extend the results to the 
general case where r > 1. 

In the rank-one case, the geodesic curve has the 
form U (t) = ucost + hsint, with t e [0,7r). For 
some n C [to], an atomic function can be written as 
\\xn - V {xn,un cost + hn sin t)\\'^, where un = Vniu) 
and hn — Vn (h). Note that un may not be of unit norm. 
For notational convenience, we drop the subscript fl. The 
following proposition describes the general behavior of an 
atomic function. 

Proposition 2: Let y,Ui,U2 G M™. Suppose that 

1) The vectors ui and U2 are linearly independent. 

2) The vector y is not orthogonal to both ui and U2 
simultaneously. 

Let u (t) — Ui cost + U2 sint where t e K. Define yr (t) — 

\yr {t)\\'^. Then the following is 



y-V{y,u {t)) and / [t) 
true. 



1) / (t) is a periodic function with period tt. 

2) / it) has a unique minimizer tniin and a unique maxi- 
mizer tmax- 

3) The maximizer ^niax defined in |2]i can be com- 
puted in the following way. Let c = [ci,C2]^ — 
coeff (y, M2])- Then t^^ax — atan (c2, ci), where the 
atan function is defined in (flST l. 

4) The minimizer tmin defined in|2]i is computed via tmin = 



atan iy^Ui 



Proof: This first part is proved by observing that 
u{t + tt) = u (t). Note that for a given t, 



{t)=y-[y^u{t)/\\u{t)f)uit). 



One has 

Ur {t + n) = y- [y'^u {t + Ti)/\\u{t + 7r)||^) u{t + Ti) 
^y-(-y^u{t)/\\u{t)\\') i-uit)) 

= Vr (t) ■ 

The other claims of this proposition are proved as follows. 
By assumption, Ui and U2 are linearly independent. As a 
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result, span ([«!, M2]) is a hyperplane with dimension two. It 
is clear that u (t) = Ui cost + U2 s'mt 7^ for all t G M and 
it forms an ellipse on the hyperplane span 1*2]) centered 
at 0. Any line in the hyperplane span ([1x1, M2]) through the 
origin can be uniquely represented by a point on the half 
ellipse u{t) with t £ [0,7r): that is, for all unit vector 
u' e span ([«!, 1x2]), there exists a unique t E [0,7r) and an 
s S R such that u's = u{t). In other words, the half ellipse 
u{t) with t € [0,7r) presents all possible lines (through the 
origin) in the hyperplane span M2])- 

Let Up be the projection of y on the hyperplane 
span ([mi, M2]), i.e., yp = proj (y, [tti, M2]). It is clear that 
/ [t) is maximized when u (t) is aligned with yp-. this means, 
there exists a constant c e K such that u (t) ~ cyp. By 
the definition of the projection, we have yp — [ui,U2] c — 
Mici + tt2C2. Therefore, tmax = atan (c2, ci). 

The function / (t) is minimized when u (t) is orthogonal 
to y. We have y^iticosiniin + y^M2sintniin = 0. Solving 
this equation proves part 4. 

We prove the uniqueness results next. By assumption, y is 
not orthogonal to both Ui and U2 simultaneously. Hence, yp 7^ 
0. Furthermore, since Ui and U2 are linearly independent, the 
vector yp is uniquely defined. This establishes the uniqueness 
of imax- Since the dimension of the hyperplane span ([tti, M2]) 
is two, there exists a unique line in span ([mi, M2]) to be 
orthogonal to yp E span 112]). We denote this line 
by a vector y± 7^ 0, such that y± e span M2]) and 
y±yp = 0. First, y± is orthogonal to y. This can be easily 
verified as y = yp + yr, where is the projection residue 
vector and therefore is orthogonal to y± as well. Second, any 
linear combination of y± and yp such that the coefficient of 
yp is nonzero produces a line that is not orthogonal to y. 
Therefore, y± represents the unique line in span ([tti, M2]) that 
is orthogonal to y. The corresponding value imin is therefore 
unique. ■ 

We proceed next with the general case where r > 1. Recall 
the expression for the geodesic curve in (|5]l. Denote Vn (h) 
by ha. Similarly, we have • • • ,Mr,fi- Let Mi,o (i) = 

111 n cost + fiQ sini. The atomic function can be written as 

/ (t) ^ \\xn - V {xn, [mi,o (t) ,U2,n, ■ ■ ■ ■ 

Again we drop the subscript ft for convenience. The following 
proposition is the key to understand the relationship between 
V {x, ui (t)) and V {x, [ui (t) , M2, • ' ' , Wr])- 

Proposition 3: Let y g R", Ui G R^xm ^j^^j jy^ ^ 
]gmx«2 ^^^iiere ni,n2 G [m]. Let 

yr = y-V{y,[Ui,U2]). 

Denote the j*'' column of U2 by {U2).y Then can be 
written as 

yr = yrS - V (yr,l, U2^r) , 

where y^i = V {y,Ui), and U2r = 
[iU2).,, - V\{U2).,, , J7i) , • • • , {U2).,, - V {{U2).,, ;C/i)]. 

Proof: The proof is centered around the notion of pro- 
jection. For arbitrary y G R™ and U G R™^", an operator 
is a projection operator if and only if V (y, U) G span (U) 



and yr -L U, where y^ = y — V (y, U). We say y^ -L U if 
y^U.,j = for all j G [n]. 

Let y' = yr,i — V (yr,i, U2.r)- To prove this proposition, 
it suffices to show that yj, _L \Ui,U2] and y — yl G 
span([(7i,J72]). 

We first show that y(, _L [Ui, U2]- That yj, _L Ui is verified 
as follows. Since 7^ (yr.i, C^2,r) G span ((72. r) and each 
column of U2.r is orthogonal to Ui, we have V (yr.i, C^2,r) -L 
Ui. The definition of y^^i implies that y^^i -L Ui. Hence, 
we have y^ _L Ui as the vector yj, is a linear combination 
of y,, 1 and V {yr,i,U2.r)- We claim that y'^ _L U2 as well. 
According to the definition of y^, it is clear that y^ _L U2.r- 

Note that ((72)^^- - ([/2.,),j + P (((72)^^- , iJi) . The vector 

V (^{U2).j ,Ui^ is in the span (C/i) and therefore orthogonal 
to y^. As a result, yj, _L (72. We then have y^ _L [Ui, 1/2]- 
Next, we show that y — yj, G span ([Ci, [/2]). Note that 

y y'r = y~ yr,i + v (y^.i, U2,r) 

= V{y,Ui)+V{yrA,U2,r). 

Clearly, V (y,Ui) G span((7i) C span ([(7i, (72]). Further- 
more, according to the definition of U2.r, span ((72. r) C 
span ([J7i, (72]) and therefore 7^ (yr,i, C/2,r) G span ((72, r) C 
span ([(7i, (72]). This completes the proof. ■ 
Based on the claim of this proposition, one can to apply the 
analysis for the rank-one case (Proposition |2|i to higher-rank 
cases. Let (7^i = [tt2, • • • , Ur], and let a;,. = x — V {x, CT^i). 
Similarly, define Mi and hr- It is clear that 

Ui,r{t)^Ui {t)~V{Ui {t),U^i) 

= Ui cost + hsint 

- V {ui,U^i) cost-V {h, U^i) sint 
= Ui^r cos t + hr sin t. 

One has 

X -V {X, [Ui (t) ,U2, - ■ ■ , Ur]) 
— Xr — V {Xr, Ml,r (t)) ■ 

This establishes the connection between the rank-one case 
and the general case, proves Proposition [T] and justifies the 
procedure in Section IIII-FI for computing minimizers and 
maximizers. 
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